
    On optimally partitioning a text to improve its compression

    In this paper we investigate the problem of partitioning an input string T in such a way that compressing its parts individually via a base compressor C yields a compressed output shorter than applying C over the entire T at once. This problem was introduced in the context of table compression, and then further elaborated and extended to strings and trees. Unfortunately, the literature offers poor solutions: namely, we know either a cubic-time algorithm for computing the optimal partition based on dynamic programming, or a few heuristics that do not guarantee any bounds on the efficacy of their computed partition, or algorithms that are efficient but work only in specific scenarios (such as the Burrows-Wheeler Transform) and achieve compression performance that might be worse than the optimal partitioning by an Ω(√(log n)) factor. Therefore, computing the optimal solution efficiently is still open. In this paper we provide the first algorithm which is guaranteed to compute, in O(n log_{1+ε} n) time, a partition of T whose compressed output is guaranteed to be no more than (1+ε) times worse than the optimal one, where ε may be any positive constant.
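
    The cubic-time dynamic-programming baseline mentioned in this abstract is easy to state: the best cost of a prefix is the best cost of a shorter prefix plus the cost of compressing the remaining part on its own. A minimal sketch follows, using zlib purely as a stand-in base compressor (the problem is compressor-agnostic; compressed_size is a hypothetical helper, not the paper's C):

        import zlib

        def compressed_size(s: bytes) -> int:
            # Stand-in for |C(s)|, the output size of the base compressor C.
            return len(zlib.compress(s))

        def optimal_partition_cost(T: bytes) -> int:
            n = len(T)
            opt = [0] + [float("inf")] * n        # opt[j]: best cost of T[:j]
            for j in range(1, n + 1):
                for i in range(j):                # last part is T[i:j]
                    c = opt[i] + compressed_size(T[i:j])
                    if c < opt[j]:
                        opt[j] = c
            return opt[n]                         # O(n^2) parts, cubic time overall

    The paper's O(n log_{1+ε} n) algorithm gains its speedup by avoiding the evaluation of all Θ(n^2) candidate parts; this sketch makes no such attempt.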

    Blockchained Post-Quantum Signatures

    Inspired by the blockchain architecture and existing Merkle-tree-based signature schemes, we propose BPQS, an extensible post-quantum (PQ) resistant digital signature scheme best suited to blockchain and distributed ledger technologies (DLTs). One of the unique characteristics of the protocol is that it can take advantage of application-specific chain/graph structures in order to decrease key generation, signing, and verification costs, as well as signature size. Compared to recent improvements in the field, BPQS outperforms existing hash-based algorithms when a key is reused for a reasonable number of signatures, while it supports a fallback mechanism to allow for a practically unlimited number of signatures if required. To our knowledge, this is the first signature scheme that can utilise an existing blockchain or graph structure to reduce the signature cost to one OTS, even when we plan to sign many times. This makes existing many-time stateful signature schemes obsolete for blockchain applications. We provide an open-source implementation of the scheme and benchmark it.
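
    To make the chaining idea concrete, here is a toy sketch: each one-time key signs the message together with a commitment to the next public key, so a verifier can walk the chain back to a single published root. Lamport OTS stands in for the one-time primitive purely for illustration; BPQS's actual OTS, parameters, and fallback mechanism differ.

        import hashlib, os

        H = lambda b: hashlib.sha256(b).digest()

        def ots_keygen():
            # Lamport one-time key: two 32-byte secrets per digest bit.
            sk = [(os.urandom(32), os.urandom(32)) for _ in range(256)]
            pk = [(H(a), H(b)) for a, b in sk]
            return sk, pk

        def ots_sign(sk, msg: bytes):
            d = int.from_bytes(H(msg), "big")
            return [sk[i][(d >> i) & 1] for i in range(256)]   # reveal one preimage per bit

        def ots_verify(pk, msg: bytes, sig) -> bool:
            d = int.from_bytes(H(msg), "big")
            return all(H(sig[i]) == pk[i][(d >> i) & 1] for i in range(256))

        def pk_digest(pk):
            # Commitment to a public key, suitable for chaining.
            return H(b"".join(h for pair in pk for h in pair))

        # Chaining: key 1 signs the message plus a commitment to key 2,
        # so only pk_digest(pk1) needs to be published (e.g. anchored on-chain).
        sk1, pk1 = ots_keygen()
        sk2, pk2 = ots_keygen()
        msg = b"transfer 10 coins"
        sig = ots_sign(sk1, msg + pk_digest(pk2))
        assert ots_verify(pk1, msg + pk_digest(pk2), sig)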

    Parsing Algorithms for Data Compression

    The task of parsing consists of splitting an input text into a sequence of contiguous phrases. Several classes of data compression algorithms rely on parsing techniques, either to identify repetitions inside the input string (dictionary-based compression) or to locate homogeneous pieces of data which are separately compressed (Permute-Partition-Compress paradigm). In these applications, the choice of the parsing strategy is decisive for the final performance of the compressor. An ideal parsing algorithm should be able to parse the input in a way that minimizes the output size of the underlying compressor, and the question is how efficiently this can be done. Many investigations have focused on parsing algorithms that achieve optimality in the compressor's output size, but the solutions proposed in the literature are far from satisfactory. In fact, most of them are either simple approaches based on dynamic programming with prohibitive time complexities, or heuristic algorithms which do not offer any bounds on the efficacy of the solution. We propose a new approach to the design of optimal parsing algorithms, achieving significant improvements in running time over previous methods. It is well known that this problem can be modeled as a shortest-path computation over a particular directed acyclic graph. We build upon this idea by showing that the class of graphs arising from this reduction satisfies particular structural properties that can be exploited by our algorithms to substantially speed up the shortest-path computation. We obtain new results by applying this approach to the contexts of dictionary-based compression and the Permute-Partition-Compress paradigm. We consider the class of LZ77-based compressors, the most powerful example of dictionary-based compression, and design the first parser which achieves optimality in the compressed output size (measured in bits) while taking efficient/optimal time and optimal space. Then, using similar techniques, we provide an approximate parsing algorithm that, when used inside the Permute-Partition-Compress paradigm, produces a compressed output whose size is guaranteed to be no more than (1 + ε) times worse than the optimal one, where ε is a user-defined constant.
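
    The shortest-path reduction described in this abstract can be sketched in a few lines: nodes are the text positions 0..n, an edge (i, j) means "emit T[i:j] as a single phrase" and is weighted by that phrase's cost in the final output. Since every edge goes forward, positions are already in topological order and one left-to-right relaxation suffices. Here candidate_edges and phrase_cost are hypothetical stand-ins for the compressor's phrase set and codeword lengths:

        def optimal_parse_cost(n, candidate_edges, phrase_cost):
            # Single-source shortest path over the parsing DAG, exploiting
            # the fact that every edge (i, j) satisfies i < j.
            INF = float("inf")
            dist = [INF] * (n + 1)
            dist[0] = 0
            for i in range(n):
                if dist[i] == INF:
                    continue
                for j in candidate_edges(i):      # phrases starting at position i
                    c = dist[i] + phrase_cost(i, j)
                    if c < dist[j]:
                        dist[j] = c
            return dist[n]                        # cost of an optimal parsing of T[0:n]

    The structural properties mentioned in the abstract are what let the actual algorithms avoid enumerating every edge, which this sketch does naively.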

    On Compact Representations of All-Pairs-Shortest-Path-Distance Matrices

    Let G be an unweighted and undirected graph of n nodes, and let D be the n × n matrix storing the All-Pairs-Shortest-Path distances in G. Since D contains integers in [n] ∪ {+∞}, its plain storage takes n^2 log(n+1) bits. However, a simple counting argument shows that n^2/2 bits are necessary to store D. In this paper we investigate the question of finding a succinct representation of D that requires O(n^2) bits of storage and still supports constant-time access to each of its entries. This is asymptotically optimal in the worst case, and far from the information-theoretic lower bound by a multiplicative factor log_2 3 ≈ 1.585. As a result, O(1) bits per pair of nodes in G are enough to retain constant-time access to their shortest-path distance. We achieve this result by reducing the storage of D to the succinct storage of labeled trees and ternary sequences, for which we properly adapt and orchestrate the use of known compressed data structures.
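
    One way to see where the ternary sequences come from: if u and v are adjacent in G, then for every target w the distances d(u, w) and d(v, w) differ by at most 1 (triangle inequality over an edge of weight 1). So, along a spanning tree of G, each row of D is recoverable from its parent's row plus a sequence over {-1, 0, +1}. The sketch below illustrates only this encoding step, with plain lists and a hypothetical spanning-tree input; the paper's actual structures add succinct storage and constant-time access on top.

        def ternary_row_encoding(D, parent, order):
            # `order` is a spanning-tree traversal with order[0] the root and
            # parent[v] a graph neighbor of v; assumes G is connected.
            # Returns the root's row plus, for every other node, a ternary
            # difference sequence relative to its parent's row.
            root = order[0]
            diffs = {}
            for v in order[1:]:
                u = parent[v]
                diffs[v] = [D[v][w] - D[u][w] for w in range(len(D))]
                assert all(x in (-1, 0, 1) for x in diffs[v])   # adjacency bound
            return D[root], diffs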

    On the bit-complexity of Lempel-Ziv compression

    One of the most famous and investigated lossless data-compression schemes is the one introduced by Lempel and Ziv about 30 years ago [37]. This compression scheme is known as a “dictionary-based compressor” and consists of squeezing an input string by replacing some of its substrings with (shorter) codewords which are actually pointers to a dictionary of phrases built as the string is processed. Surprisingly enough, although many fundamental results are nowadays known about the speed and effectiveness of this compression process (see e.g. [23, 29] and references therein), “we are not aware of any parsing scheme that achieves optimality when the LZ77-dictionary is in use under any constraint on the codewords other than being of equal length” [29, p. 159]. Here optimality means achieving the minimum number of bits in compressing each individual input string, without any assumption on its generating source. In this paper we investigate three issues pertaining to the bit-complexity of LZ-based compressors, and we design algorithms which achieve bit-optimality in the compressed output size while taking efficient/optimal time and optimal space. These theoretical results are sustained by experiments that compare our novel LZ-based compressors against the most popular compression tools (like gzip and bzip2) and state-of-the-art compressors (like the booster of [13, 12]).
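
    The bit-complexity question becomes concrete once copy-phrases (distance, length) are written with a variable-length integer code: two parsings with the same number of phrases can then cost very different numbers of bits. The sketch below uses Elias gamma coding purely as an example encoder; the work treats general codeword-length functions, not this specific one.

        def gamma_bits(x: int) -> int:
            # Length in bits of the Elias gamma code of a positive integer:
            # 2 * floor(log2 x) + 1.
            return 2 * (x.bit_length() - 1) + 1

        def phrase_bits(distance: int, length: int) -> int:
            return gamma_bits(distance) + gamma_bits(length)

        # Copying 8 symbols from 5 positions back is far cheaper than from
        # 70000 positions back, so a bit-optimal parser must weigh distances:
        print(phrase_bits(5, 8), phrase_bits(70000, 8))   # 12 vs 40 bits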

    Succinct Oracles for Exact Distances in Undirected Unweighted Graphs

    Let G be an unweighted and undirected graph of n nodes, and let D be the n × n matrix storing the All-Pairs-Shortest-Path distances in G. Since D contains integers in [n], its plain storage takes n^2 log(n+1) bits. However, a simple counting argument shows that n^2/2 bits are necessary to store D. In this paper we investigate the question of finding a succinct representation of D that requires O(n^2) bits of storage and still supports constant-time access to each of its entries. This is asymptotically optimal in the worst case, and far from the information-theoretic lower bound by a multiplicative factor log_2 3 ≈ 1.585. As a result, O(1) bits per pair of nodes in G are enough to retain constant-time access to their shortest-path distance. We achieve this result by reducing the storage of D to the succinct storage of labeled trees and ternary sequences, for which we properly adapt and orchestrate the use of known compressed data structures.
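
    Reading the figures in this abstract literally, the three space bounds line up as below (log is base 2; the ≈0.79 n^2 figure is implied by the quoted log_2 3 factor over the lower bound, not stated explicitly in the abstract):

        \[
        \underbrace{n^2 \lceil \log(n+1) \rceil}_{\text{plain storage}}
        \;\gg\;
        \underbrace{\log_2 3 \cdot \tfrac{n^2}{2} \;\approx\; 0.79\, n^2}_{\text{this paper } (O(1)\text{-time access})}
        \;\ge\;
        \underbrace{\tfrac{n^2}{2}}_{\text{counting lower bound}}
        \]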

    On the bit-complexity of Lempel-Ziv compression

    One of the most famous and investigated lossless data-compression schemes is the one introduced by Lempel and Ziv about 30 years ago [IEEE Trans. Inform. Theory, 23 (1977), pp. 337--343]. This compression scheme is known as a “dictionary-based compressor” and consists of squeezing an input string by replacing some of its substrings with (shorter) codewords which are actually pointers to a dictionary of phrases built as the string is processed. Surprisingly enough, although many fundamental results are nowadays known about the speed and effectiveness of this compression process, “we are not aware of any parsing scheme that achieves optimality … under any constraint on the codewords other than being of equal length” [N. Rajpoot and C. Sahinalp, in Handbook of Lossless Data Compression, Academic Press, New York, 2002, p. 159]. Here optimality means achieving the minimum number of bits in compressing each individual input string, without any assumption on its generating source. In this paper we investigate some issues pertaining to the bit-complexity of LZ77-based compressors, the most powerful variant of the LZ-compression scheme, and we design algorithms which achieve bit-optimality in the compressed output size while taking efficient/optimal time and optimal space.